56 research outputs found

Approximate Membership for Regular Languages modulo the Edit Distance

Author: Lemay Aurélien
Ndione Antoine
Niehren Joachim
Publication venue: Elsevier BV
Publication date: 15/02/2013

We present a probabilistic algorithm for testing approximate membership of words in regular languages modulo the edit distance. The time complexity of our algorithm, which is independent of the size of the input word, is polynomial in the size of the input automaton and the inverse error precision. All previous property testing algorithms for regular languages, whether they consider approximations modulo the Hamming distance or the edit distance with moves, run in exponential time unless one of these parameters is fixed.

HAL - Lille 3

INRIA a CCSD electronic archive server

Learning n-ary Node Selecting Tree Transducers from Completely Annotated Examples

Author: Gilleron Rémi
Lemay Aurélien
Niehren Joachim
Publication venue: Springer Verlag
Publication date: 01/01/2006

We present the first algorithm for learning n-ary node selection queries in trees from completely annotated examples by methods of grammatical inference. We propose to represent n-ary queries by deterministic n-ary node selecting tree transducers (NSTTs), which are known to capture the class of MSO-definable n-ary queries. Despite this high expressiveness, we show that n-ary queries selecting a polynomially bounded number of tuples per tree, represented by deterministic NSTTs, can be learned with polynomial time and data while allowing for efficient enumeration of query answers. An application to wrapper induction in Web information extraction yields encouraging results.

HAL - Lille 3

CiteSeerX

INRIA a CCSD electronic archive server

Efficient Inclusion Checking for Deterministic Tree Automata and XML Schemas

Author: Champavère Jérôme
Gilleron Rémi
Lemay Aurélien
Niehren Joachim
Publication venue: Elsevier BV
Publication date: 01/11/2009

Special issue of LATA'08. We present algorithms for testing language inclusion L(A) ⊆ L(B) between tree automata in time O(|A| |B|) where B is deterministic (bottom-up or top-down). We extend our algorithms to testing inclusion of automata for unranked trees A in deterministic DTDs or deterministic EDTDs with restrained competition D in time O(|A| |Σ| |D|). Previous algorithms were less efficient or less general.

HAL - Lille 3

Elsevier - Publisher Connector

INRIA a CCSD electronic archive server

gMark: Schema-Driven Generation of Graphs and Queries

Author: Advokaat Nicky
Bagan Guillaume
Bonifati Angela
Ciucanu Radu
Fletcher George H. L.
Lemay Aurélien
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 24/11/2016

Massive graph data sets are pervasive in contemporary application domains. Hence, graph database systems are becoming increasingly important. In the experimental study of these systems, it is vital that the research community has shared solutions for the generation of database instances and query workloads having predictable and controllable properties. In this paper, we present the design and engineering principles of gMark, a domain- and query language-independent graph instance and query workload generator. A core contribution of gMark is its ability to target and control the diversity of properties of both the generated instances and the generated workloads coupled to these instances. Further novelties include support for regular path queries, a fundamental graph query paradigm, and schema-driven selectivity estimation of queries, a key feature in controlling workload chokepoints. We illustrate the flexibility and practical usability of gMark by showcasing the framework's capabilities in generating high quality graphs and workloads, and its ability to encode user-defined schemas across a variety of application domains. Comment: Accepted in November 2016. URL: http://ieeexplore.ieee.org/document/7762945/. in IEEE Transactions on Knowledge and Data Engineering 201

arXiv.org e-Print Archive

Crossref

Repository TU/e

Pure OAI Repository

HAL Clermont Université

INRIA a CCSD electronic archive server

HAL

Hal-Diderot

Schema-Guided Induction of Monadic Queries

Author: Champavère Jérôme
Gilleron Rémi
Lemay Aurélien
Niehren Joachim
Publication venue: Springer Science and Business Media LLC
Publication date: 01/09/2008

The induction of monadic node selecting queries from partially annotated XML-trees is a key task in Web information extraction. We show how to integrate schema guidance into an RPNI-based learning algorithm, in which monadic queries are represented by pruning node selecting tree transducers. We present experimental results on schema guidance by the DTD of HTML.

HAL - Lille 3

INRIA a CCSD electronic archive server

Identification of biRFSA languages

Author: Latteux Michel
Lemay Aurélien
Roos Yves
Terlutte Alain
Publication venue: Elsevier BV
Publication date: 01/01/2006

The task of identifying a language from a set of its words is not an easy one. For instance, it is not feasible to identify regular languages in the general case. Therefore, looking for subclasses of regular languages that can be identified in this framework is an interesting problem. One of the most classical identifiable classes is the class of reversible languages, introduced by D. Angluin, also called bideterministic languages as they can be represented by deterministic automata (DFA) whose reverse is also deterministic. Residual Finite State Automata (RFSA), on the other hand, are a class of nondeterministic automata that share some properties with DFA. In particular, DFA are RFSA, and RFSA can be much smaller. We study here the learnability of the class of languages that can be represented by biRFSA: RFSA whose reverses are also RFSA. We prove that this class is not identifiable in general, but we present two subclasses that are learnable, the second one being identifiable in polynomial time.

HAL - Lille 3

Elsevier - Publisher Connector

INRIA a CCSD electronic archive server

Efficient Inclusion Checking for Deterministic Tree Automata and DTDs

Author: Champavère Jérôme
Gilleron Rémi
Lemay Aurélien
Niehren Joachim
Publication venue: Springer Fachmedien Wiesbaden GmbH
Publication date: 01/03/2008

We present a new algorithm for testing language inclusion L(A) ⊆ L(B) between tree automata in time O(|A| |B|) where B is deterministic. We extend this algorithm to testing inclusion between automata for unranked trees A and deterministic DTDs D in time O(|A| |Σ| |D|). No previous algorithms with these complexities exist. A journal extension is available at http://hal.inria.fr/inria-00366082

HAL - Lille 3

INRIA a CCSD electronic archive server

Query Induction with Schema-Guided Pruning Strategies

Author: Champavère Jérôme
Gilleron Rémi
Lemay Aurélien
Niehren Joachim
Publication venue: Microtome Publishing
Publication date: 01/01/2013

Inference algorithms for tree automata that define node selecting queries in unranked trees rely on tree pruning strategies. These impose additional assumptions on node selection that are needed to compensate for small numbers of annotated examples. Pruning-based heuristics in query learning algorithms for Web information extraction often boost the learning quality and speed up the learning process. We distinguish the class of regular queries that are stable under a given schema-guided pruning strategy, and show that this class is learnable with polynomial time and data. Our learning algorithm is obtained by adding pruning heuristics to the traditional learning algorithm for tree automata from positive and negative examples. While justified by a formal learning model, our learning algorithm for stable queries also performs very well in practice for XML information extraction.

HAL - Lille 3

CiteSeerX

INRIA a CCSD electronic archive server

Learning Top-Down Tree Transducers with Regular Domain Inspection

Author: Boiret Adrien
Lemay Aurélien
Niehren Joachim
Publication venue: HAL CCSD
Publication date: 05/10/2016

We study the problem of how to learn tree transformations on a given regular tree domain from a finite sample of input-output examples. We assume that the target tree transformation can be defined by a deterministic top-down tree transducer with regular domain inspection (DTOPi:reg). An RPNI-style learning algorithm that solves this problem in polynomial time and with polynomially many examples was presented at Pods'2010, but restricted to the case of path-closed regular domains. In this paper, we show that this restriction can be removed. For this, we present a new normal form for DTOPi:reg by extending the Myhill-Nerode theorem for DTOP to regular domain inspections in a nontrivial manner. The RPNI-style learning algorithm can also be lifted, but becomes more involved as well.

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Sublinear DTD Validity

Author: Lemay Aurélien
Ndione Antoine Mbaye
Niehren Joachim
Publication venue: HAL CCSD
Publication date: 02/03/2015

We present an efficient algorithm for testing approximate DTD validity modulo the strong tree edit distance. Our algorithm inspects XML documents in a probabilistic manner. It detects with high probability the nonvalidity of XML documents with a large fraction of errors, measured in terms of the strong tree edit distance from the DTD. The run time depends polynomially on the depth of the XML document tree but not on its size, so that it is sublinear in most cases. Therefore, our algorithm can be used to speed up exact DTD validators that run in linear time. We also prove a negative result showing that the run time of any approximate DTD validity tester must depend on the depth of the input tree. A long version is available here.

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot